Parameter estimation on multivalent ITC data sets

The Wiseman fitting can be used to extract binding parameters from ITC data sets, such as heat of binding, number of binding sites, and the overall dissociation rate. The classical Wiseman fitting assumes a direct binding process and neglects the possibility of intermediate binding steps. In principle, it only provides thermodynamic information and not the kinetics of the process. In this article we show that a concentration dependent dissociation constant could possibly stem from intermediate binding steps. The mathematical form of this dependency can be exploited with the aid of the Robust Perron Cluster Cluster Analysis method. Our proposed extension of the Wiseman fitting rationalizes the concentration dependency, and can probably also be used to determine the kinetic parameters of intermediate binding steps of a multivalent binding process. The novelty of this paper is to assume that the binding rate varies per titration step due to the change of the ligand concentration and to use this information in the Wiseman fitting. We do not claim to produce the most accurate values of the binding parameters, we rather present a novel method of how to approach multivalent bindings from a different angle.


Parameter estimation on multivalent ITC data sets Franziska Erlekam, Maximilian Zumbansen & Marcus Weber *
The Wiseman fitting can be used to extract binding parameters from ITC data sets, such as heat of binding, number of binding sites, and the overall dissociation rate. The classical Wiseman fitting assumes a direct binding process and neglects the possibility of intermediate binding steps. In principle, it only provides thermodynamic information and not the kinetics of the process. In this article we show that a concentration dependent dissociation constant could possibly stem from intermediate binding steps. The mathematical form of this dependency can be exploited with the aid of the Robust Perron Cluster Cluster Analysis method. Our proposed extension of the Wiseman fitting rationalizes the concentration dependency, and can probably also be used to determine the kinetic parameters of intermediate binding steps of a multivalent binding process. The novelty of this paper is to assume that the binding rate varies per titration step due to the change of the ligand concentration and to use this information in the Wiseman fitting. We do not claim to produce the most accurate values of the binding parameters, we rather present a novel method of how to approach multivalent bindings from a different angle.
The Isothermal Titration Calorimetry (ITC) experiment, together with a method to extract binding parameters of binding interactions between a macromolecule and a ligand, was introduced by Wiseman et al. in 1989 1 . The assembly of this experiment is a reference cell containing a buffer solution, a main cell containing a fixed macromolecule concentration. Both cells are in an adiabatic jacket. The ligand concentration is titrated through a syringe into the main cell at fixed time points, evolving or absorbing heat during the binding process. In order to maintain the same temperature as in the reference cell, the main cell is heated or cooled down and the power used is being recorded. As the molar ratio between ligand and macromolecule increases with each injection, at some point saturation occurs and the heat evolved decreases. The classical ITC method would then connect the heat peaks to a sigmoidal curve. The curve slope at midpoint is K a . The midpoint itself is the stoichiometry.
Based on the ITC experiment, another method was introduced in 2012 by Burnouf et al. 2 , as well as Vander Meulen and Butcher 3 , which not only extracts thermodynamic information, but also binding kinetics from the heat flow. KinITC approximates the exponential curve of every injection of the isotherm after each peak with a least squares method. Finally, the k on of the binding process is determined by averaging the k on rates of all the titration steps.
The method presented in this paper uses the integrated peaks of the recorded heat flow and optimizes the kinetic binding parameters such that the deviation to the actual heat flow is minimal. This method is performed for both the Wiseman fitting and the Q c fitting. Again we show that the binding rates do differ for the first and subsequent bindings and unbindings. k on is ligand concentration dependent and changes thus over time.
First, the experimental binding rates for each injection can be extracted. The overall binding rates k on and k off are retrieved by calculating the average of those experimental binding rates. Their fraction defines the dissociation constant as well as the association constant Both are binding parameters extracted by the Wiseman method. The binding rates determine the binding affinity between two molecules. Having a molecule L and a molecule M, a single binding is described by  4 to retrieve microscopic binding rates through the experimental binding rates extracted via kinITC. Microscopic binding rates determine the binding affinity between the binding sites of two molecules which is not constant. This is the fundamental difference to the existing kinITC method: In the classical kinITC the heat curve of each injection is used to obtain a k on value which is averaged in the end to on overall k on rate of the binding process. For kinITC the shapes of the heat flow peaks are essential, whereas we only need th ethermodynamic information, because these reactions are concentration dependent. The key point is that there is a concentration dependent k on rate per titration step.
In 4 , the Robust Perron Cluster Algorithm (PCCA+) is used to coarse grain microscopic binding rates to modelled binding rates for each molar ratio, which are then used to fit the experimental binding rates. The transition states of complex binding processes can be modeled with row-stochastic rate matrices. PCCA+ is a spectral cluster method to reduce the dimensionality of these matrices. The result is a fuzzy clustering and each microstate gets assigned a certain membership to the two macro states: bound and unbound. In 5 Weber and Röblitz demonstrate that this robust method always delivers a fuzzy clustering for nearly uncoupled Markov chains with transition states. The advantage compared to other clustering methods is that for PCCA+ the process must not necessarily be reversible and the matrix does not need to be decomposable.
This shown concentration dependency of the paper 4 gave the inspiration for the following: Erlekam et al. showed an interdependance of k on to the ligand concentration. In this paper, we will approach the topic of concentration dependence from another angle. In this paper we assume a dependence of the dissocation rate k off and the ligand concentration.
The thermodynamic binding parameters are the enthalpy or heat of binding H • , which defines the amount of heat absorbed or produced for a binding, the entropy S • , which is a measure of dispersal of energy for a binding, and the Gibbs free energy G • , the amount of available energy, and the number of binding sites n. As stated in 1 , the heat of binding, the entropy, the Gibbs free energy, and the association constant are related through the following equation where R is the gas constant and T the absolute temperature in Kelvin.
In this paper, we will compare the resulting graphs of the Wiseman fitting with the proposed Q c fitting of the paper 4 over ITC data sets. For this proof of concept we used experimental ITC data from Igde et al. 6 . The receptors used therein were lectin Concanavalin A (Con A). Depending on pH of the assay, the receptor can be bivalent or tetravalent.The ligands were glycooligomers based on oligo(amidoamines) with pendant mannose side chains. For the bivalent fitting we use bivalent Con A with the bivalent macromolecule for the bivalent fitting. For the trivalent fitting we used the trivalent Con A and the tetravalent macromolecule. Thus, the maximal compound valency depends on the minimum valency of ligand and receptor. Because we are using the Wiseman function that hold for 1:1 stoichiometries in 1 , we also assume a 1:1 stoichiometry in our examples below.

The Wiseman fitting
Having an ITC data set of a binding between ligand L and macromolecule M with T ∈ N injections, and the respective sets of total concentrations L t = {l 1 , . . . , l T } and M t = {m 1 , . . . , m T } , the Wiseman function is defined as for l ∈ L t and m ∈ M t . By using the integrated peaks of the ITC data for each injection the Wiseman fitting obtains the association constant, the number of binding site and the enthalpy by minimizing the difference of the hat of binding and the Wiseman function scaled with the enthalpy. For this minimization the Frobenius norm is used: Note, that in this expression the term W is without physical unit. Thus, the unit of q trans will correspond to the physical unit of H • . Scaling q trans just scales H • . In order to be without physical unit, K a has to reverse the physical unit of m. A common rescaling of m and l only changes the physical unit of K a .

The Q c fitting
As the exisitng methods 2,4 suggest, it is valid for multivalent data sets to assume a connection between molar ratio and association or dissociation. With this in mind, a new fitting can be specified by using microscopic binding rates for an s-valent binding www.nature.com/scientificreports/ for s ∈ N . To retrieve association constants for each molar ratio, the PCCA+ algorithm from Weber 5 is used to coarse grain a transition rate matrix of the microscopic binding rates Q to a matrix Q c containing the (macroscopic) binding rates. The details of how to obtain a dimension reduction via coarse graining is described in 7 . It is important to mention here, that this type of analysis just provides the relative ratios of the rates, but can not figure out the absolute physical unit of them. Thus, providing the k on and k off values with physical units does not make sense here. Following the method of 4 , the bivalent transition rate matrix is For the trivalent case, see the Supplementary File. To get from the transposed transition rate matrix Q T to its coarse grained transposed counterpart Q T c , the PCCA+ algorithm introduces where � := diag(π) with stationary distribution π of the state space S related to Q , and fuzzy membership matrix χ . To retrieve χ , the inner simplex algorithm from 8 is used. By using the interpretation of Weber 9 , the coarse grained matrix yields from which K a can be retrieved.
The transition rate matrix Q requires the ligand conctration [L], which will be calculated iteratively for each injection i as where is the bound ligand concentration for injection i. For an s-valent binding, we denote with the set of free ligand concentrations [L] and, on the right side, the resulting association constants K i for each injection i ∈ {1, . . . , T} . With the Frobenius norm, we then define the Q c fitting as

Extracting the experimental data
To extract the total concentrations L t and M t from the ITC experiments, we follow the supporting information's appendix I of Egawa et al. 10 , where k on 1 and k off 1 for the first microscopic binding, k on 2 and k off 2 for the second consecutive microscopic binding, . . . k on s and k off s for the s-th consecutive microscopic binding, For the bivalent case, the values are and for the trivalent case with j ∈ {2, . . . , 14} . In the following, we will compare the quality between the Wiseman fitting and the proposed Q c fitting.

Comparison of fittings
For the Wiseman fitting, the peaks of the heat curve are integrated with univariate splines of degree 5. The parameters K a , n and H • are obtained by a Nelder-Mead minimization.
For the Q c fitting, the algorithm is implemented as a random search loop with breaking condition when a better norm, compared to the Wiseman fitting, is found. When a suitable norm threshold is reached, a Nelder-Mead minimization is applied on the Q c fitting over parameters n and H • to find the optimal solution to the current microscopic binding rates.
In the following subsection we present the Wiseman fitting to experimental data from Igde et al. 6 . In this work, the authors selected data sets of bi-and trivalent ligands binding to tetrameric Con A with four binding sites from a series of measurements. The precision glycomacromolecule were mannose ligands with oligomer scaffolds. Since we assume a1:1 stoichiometry, the binding process can only have the valency of the respective ligand, no matter if the receptor has morepotential binding sites.

Bivalent ligand case. The bivalent case, where Con A and a bivalent precision glycomacromolecules bind
together, resolves into the ITC plot as depicted in Fig. 1: The heat evolved is extracted by integration over the 14 injection peaks. The Wiseman fitting returns the following binding parameters and results in an error of 5.4838.
For the Q c fitting, the retrieved binding parameters are   Fig. 2(a). Their respective errors are shown in Fig. 2(b). When comparing the binding parameters, the heat of binding H • from the Wiseman fitting is almost double the value from the Q c fitting, whereas the number of sites n is only two thirds.
The expected value for n is 0.2, which is better represented through the Wiseman fitting. The association constant K a is not comparable to the microscopic association constants The latter signal a stronger first and a very weak second binding. As has been said before: Because of the lack of physical units, the analysis just allows for this "relative", qualitative result.

Trivalent ligand case.
We will now examine the ITC plot resulting from a binding between a trivalent Con A and a tetravalent Precision Glycomacromolecule, as depicted in Fig. 3. Compared to the bivalent case, the heat curve resolves in a distinct slope at molar ratio 0.3. As the slope is the most crucial part when determining the binding constants, before using the Frobenius norm, the absolute difference between fitting and q trans is multiplied by a Gaussian curve with mean µ = 0.3 and standard deviation σ = 0.1758 . The first injection is considered an outlier and is excluded from the minimization.
For the Wiseman fitting, the resulting binding parameters are K 1 := k on 1 /k off 1 = 57.5217   Fig. 4(a). Their respective errors are shown in Fig. 4(b). This time the number of sites n is almost identical for both fittings and the heat of binding is fairly equal. For the microscopic association constants, we have which are very low numbers compared to K a from the Wiseman fitting. The second binding appears to be the easiest, whereas the first and the last are less frequent.

Model selection
There is not much freedom of determining the correct model for the interpretation of the experimental data. The Wisemanfitting itself is assumed to be the fixed root model for the interpretation of the ITC data in terms of association and dissociationconstants. The model Q for the detailed kinetic process depends on the chemical nature of the substances. The elementarybinding events (the stoichiometry) only depend on the number of binding sites and ligand sites The only free choice we made is to project the matrix Q to a two-dimensional matrix Q c , because we assume that the onlydominating process is the "slow" overall binding event, which we hope to model correctly and robustly by the coarse grainedmatrix.
The choice of the projection space has been discussed in terms of "implied timescales". It has been analyzed in terms of theseparation of the timescale of the projected process with regard to the next faster process11. The first eigenvalue 1 of theQ c -matrix is always 0, thus this separation of timescales can be estimated from taking the ratio of the second and third eigenvalueof Q c , 2 3 . In case of the bivalent example, this ratio grows from 1.1 in the first titration step to 2 in the last titration step, whichmeans that the second faster process is already two times faster than the slowest one (at the end of the ITC experiment). Thisindicates a good separation and choice of modelling. In the case of the trivalent example, however, this ratio is always at about1.14 and indicates a bad separation.
If we would model Q c as a two-step binding process instead of a one-step binding, then the timescales separation would begiven by the ratio of the third and forth eigenvalues of Q , i.e. 4 3 . In this case this ratio grows from 1.0 in the first titration stepto 4.7 in the last one along the ITC experiment. However, this would not allow for a comparison with regard to the appliedWiseman fitting.

Error Estimates
To test the robustness of this fitting, the ITC input data was disturbed by up to 1%, i.e. every value was multiplied by a randomvariable x i ∈ [0.99, 1.01].
In the bivalent example we have the following changes after disturbing the input: n is 0.33% higher, ΔH decreased by 0.31%and the best norm decreased by 0.18% compared to the undisturbed model. The k on1 , k on2 , k off1 and k off2 differ only by a factorof 10 −6 from the disturbed model binding parameters. Therefore, K 1 and K 2 do not differ either. Thus, we have a stable modelfor the bivalent case.
The trivalent case is more complicated. The binding parameters of the undisturbed and undisturbed models differ up to 743%.K 1 , K 2 and K 3 therefore also differ substantially, up to 653%. However, with the different optimum found in the disturbedsetting, the stoichiometry and the heat of binding are similar. In the disturbed example, n is only 0.04% higher, ΔH decreasedby 1.5% and the best norm decreased by 1.36% compared to the undisturbed model. To summarize, for the trivalent binding,the Q c fitting may not deliver satisfying results. This may be due to the same reason that we discovered in the spectral gapanalysis. Possibly, the Wiseman fitting is not suitable for trivalent bindings, because there are actually three macro states insteadof two. For the Q c fitting, this is an interesting insight, because for trivalent bindings we would need to project onto a 3×3clustered rate matrix instead of a 2×2 matrix.

Conclusion
In this paper we have shown the Q c and the Wiseman fitting for two numerical examples. We conclude that with PCCA+ the fitting of binding parameters of bivalent and trivalent ligands is better in terms of absolute errors than the Wiseman fitting. We demonstrated the results of particular association constants for each subsequent binding step. The major advantage of the proposed method is to gain kinetic information of each binding step rather than for the whole binding process only. However, the binding parameters are just "qualitative" not quantitative. Whereas, kinITC assumes concentration independent binding associations. Our method makes use of the systematic change of the association "constants" along the measurements.

Outlook
By taking microscopic binding and concentrations into account, we have proven the existence of a better fitting of bivalent and trivalent ITC data sets. However, the current algorithm for the Q c fitting utilizes random search to find the binding parameters. An algorithm for an optimal solution is currently not available and remains a problem for future research.
As the resulting fitting relies on a transition matrix and the interpretation of its coarse grained counterpart, the approach of this paper could contribute to a model which is able to explain even more complex heat curves, where the Wiseman graph is no longer satisfactory.
Our paper only shows a proof of concept for a bivalent and a trivalent model. In the future the algorithm can be tested for bindings of higher valencies to learn more about its limitations.
Further, we have assumed a 1:1 stoichiometry that does not necessarily occur in all molecular binding experiments. The theory could therefore be generalized for other stoichiometries as a future piece of research.
Lastly, as a goodness of fit metric we used the fitting error. The two models could be further compared using different criteria, such as the Bayesian information criterion. Apart from that, a parameter study can be of interest to show which of the binding parameters are correlated.

Data availability
The data and the algorithm that support the findings of this study are openly available at https:// github. com/ mazum ba/ si-multi valent-params.